Joint interpretation of input speech and pen gestures for multimodal human-computer interaction

Authors

  • Pui-Yu Hui
  • Helen M. Meng
Abstract

This paper describes our initial work in the semantic interpretation of multimodal user input that consists of speech and pen gestures. We have designed and collected a multimodal corpus of over a thousand navigational inquiries around the Beijing area. We devised a processing sequence for extracting spoken references from the speech input (perfect transcripts) and interpreting each reference by generating a hypothesis list of possible semantics (i.e., locations). We also devised a processing sequence for interpreting pen gestures (pointing, circling and strokes) and generating a hypothesis list for every gesture. Partial interpretations from the individual modalities are combined using Viterbi alignment, which enforces temporal order and semantic compatibility constraints in its cost functions to generate an integrated interpretation across modalities for the overall input. This approach correctly interprets over 97% of the 322 multimodal inquiries in our test set.
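The cross-modality integration step described above can be viewed as an order-preserving alignment between the sequence of spoken references and the sequence of pen gestures. The Python sketch below illustrates one way such a Viterbi-style alignment could be set up; the data structures, the weighting of temporal distance against hypothesis scores, and the type-equality compatibility test are illustrative assumptions, not the cost functions used in the paper.

# A minimal sketch (not the authors' implementation) of Viterbi-style alignment
# between spoken references and pen gestures.  Data structures, scoring weights
# and the compatibility test are illustrative assumptions.
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Hypothesis:
    location: str     # candidate location name
    loc_type: str     # e.g. "hotel", "street", "district"
    score: float      # confidence from the single-modality interpreter

@dataclass
class SpokenReference:
    time: float                                   # onset time of the reference (s)
    hypotheses: List[Hypothesis] = field(default_factory=list)

@dataclass
class PenGesture:
    time: float                                   # time stamp of the gesture (s)
    kind: str                                     # "point", "circle" or "stroke"
    hypotheses: List[Hypothesis] = field(default_factory=list)

def pair_cost(ref: SpokenReference, ges: PenGesture) -> Tuple[float, Optional[Hypothesis]]:
    """Local cost of pairing one spoken reference with one pen gesture.
    Combines temporal distance with semantic compatibility between the two
    hypothesis lists and returns the best compatible joint hypothesis."""
    best_cost, best_hyp = float("inf"), None
    for rh in ref.hypotheses:
        for gh in ges.hypotheses:
            if rh.loc_type != gh.loc_type:        # semantic compatibility check
                continue
            cost = abs(ref.time - ges.time) - (rh.score + gh.score)
            if cost < best_cost:
                best_cost, best_hyp = cost, gh
    return best_cost, best_hyp

def viterbi_align(refs: List[SpokenReference], gestures: List[PenGesture]):
    """Order-preserving alignment: reference i may only pair with a gesture
    that comes after the gesture paired with reference i-1."""
    n, m = len(refs), len(gestures)
    INF = float("inf")
    # dp[i][j]: best cost of interpreting the first i references,
    # with reference i paired to gesture j (column 0 is a start sentinel).
    dp = [[INF] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost, hyp = pair_cost(refs[i - 1], gestures[j - 1])
            if hyp is None:                       # no semantically compatible pair
                continue
            # Cheapest way to have paired reference i-1 with an earlier gesture.
            k = min(range(j), key=lambda p: dp[i - 1][p])
            if dp[i - 1][k] + cost < dp[i][j]:
                dp[i][j] = dp[i - 1][k] + cost
                back[i][j] = (k, hyp)
    # Trace back the cheapest complete alignment.
    j = min(range(m + 1), key=lambda p: dp[n][p])
    pairs, i = [], n
    while i > 0 and back[i][j] is not None:
        k, hyp = back[i][j]
        pairs.append((i - 1, j - 1, hyp.location))   # (reference idx, gesture idx, location)
        i, j = i - 1, k
    return list(reversed(pairs))

if __name__ == "__main__":
    # Hypothetical inquiry: "How do I get from this hotel to here?"
    refs = [
        SpokenReference(1.2, [Hypothesis("Beijing Hotel", "hotel", 0.9)]),
        SpokenReference(2.8, [Hypothesis("Wangfujing", "street", 0.7),
                              Hypothesis("Xidan", "street", 0.4)]),
    ]
    gestures = [
        PenGesture(1.5, "point", [Hypothesis("Beijing Hotel", "hotel", 0.8)]),
        PenGesture(3.0, "circle", [Hypothesis("Wangfujing", "street", 0.6)]),
    ]
    print(viterbi_align(refs, gestures))

In this formulation each spoken reference is paired with exactly one gesture, and the gesture indices must increase monotonically from one reference to the next, which is how the temporal-order constraint is enforced; the semantic-compatibility constraint is enforced by discarding hypothesis pairs whose location types disagree.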

Similar articles

Flexible Speech and Pen Interaction with Handheld Devices

An emerging research direction in the field of pervasive computing is to voice-enable applications on handheld computers. Map-based applications can benefit the most from multimodal interfaces based on speech and pen input, and graphics and speech output. However, implementing automatic speech recognition and speech synthesis on handheld computers is constrained by the relatively low computation...

Chapter to appear in Handbook of Human-Computer Interaction, (ed

Multimodal systems process two or more combined user input modes, such as speech, pen, touch, manual gestures, gaze, and head and body movements, in a coordinated manner with multimedia system output. This class of systems represents a new direction for computing, and a paradigm shift away from conventional WIMP interfaces. Since the appearance of Bolt’s (1980) “Put That There” demonstration sy...

Complementarity and redundancy in multimodal user inputs with speech and pen gestures

We present a comparative analysis of multi-modal user inputs with speech and pen gestures, together with their semantically equivalent uni-modal (speech only) counterparts. The multimodal interactions are derived from a corpus collected with a Pocket PC emulator in the context of navigation around Beijing. We devise a cross-modality integration methodology that interprets a multi-modal input an...

On Multimodal Route Navigation in PDAs

One of the biggest obstacles in building versatile natural human-computer interaction systems is that the recognition of natural speech is still not sufficiently robust, especially in mobile situations where it's almost impossible to cancel out all irrelevant auditory information. In multimodal systems the possibility to disambiguate between several input and output modalities can substantially...

An Overview of Multimodal Interaction Techniques and Applications

Introduction: Desktop multimedia (multimedia personal computers) dates from the early 1970s. At that time, the enabling force behind multimedia was the emergence of new digital technologies in the form of digital text, sound, animation, photography, and, more recently, video. Nowadays, multimedia systems are mostly concerned with the compression and transmission of data over networks, large...


Journal title:

Volume:   Issue:

Pages:  -

Publication date: 2006